Reference implementation of Offline Permuted Policy Learning, from the paper "Provably Efficient Third-Person Imitation from Offline Observation" (UAI 2020).
Credit to Tim Vieira for the arsenal package, and to Tim Vieira and Kianté Brantley for the tabular RL testbed in the files {markovchain, mdp, mrp}.py (also available at https://github.com/timvieira/rl).